Engineering a Distributed Full-Text Index

Authors

  • Johannes Fischer
  • Florian Kurpicz
  • Peter Sanders
Abstract

We present a distributed full-text index for big data applications in a distributed environment. The index can answer different types of pattern matching queries (existential, counting, and enumeration) and can also be extended to answer document retrieval queries (counting, retrieval, and top-k). We also show that succinct data structures are indeed useful for big data applications, as their low memory consumption allows us to build indices for larger slices of text in main memory.
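To make the three pattern matching query types concrete, the following is a minimal single-machine sketch in Python based on a plain suffix array. It is not the authors' distributed index and uses no succinct data structures; the class name `SuffixArrayIndex` and its methods are hypothetical names chosen for illustration, and the construction is deliberately naive. It relies only on the fact that all suffixes starting with a pattern occupy one contiguous interval of the sorted suffix array.

```python
class SuffixArrayIndex:
    """Illustrative full-text index over one string (hypothetical, not from the paper)."""

    def __init__(self, text):
        self.text = text
        # Naive construction: sort suffix start positions by the suffix they denote.
        # Fine for a demo; real indices use linear-time or distributed construction.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])

    def _prefix(self, rank, m):
        # First m characters of the suffix at the given rank in the suffix array.
        start = self.sa[rank]
        return self.text[start:start + m]

    def _interval(self, pattern):
        # Binary search for the half-open interval [lo, hi) of suffix array ranks
        # whose suffixes start with the pattern.
        m = len(pattern)
        lo, hi = 0, len(self.sa)
        while lo < hi:  # lower bound: first prefix >= pattern
            mid = (lo + hi) // 2
            if self._prefix(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        first = lo
        hi = len(self.sa)
        while lo < hi:  # upper bound: first prefix > pattern
            mid = (lo + hi) // 2
            if self._prefix(mid, m) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return first, lo

    def exists(self, pattern):
        # Existential query: does the pattern occur at all?
        lo, hi = self._interval(pattern)
        return hi > lo

    def count(self, pattern):
        # Counting query: number of occurrences.
        lo, hi = self._interval(pattern)
        return hi - lo

    def locate(self, pattern):
        # Enumeration query: all starting positions, in text order.
        lo, hi = self._interval(pattern)
        return sorted(self.sa[lo:hi])


index = SuffixArrayIndex("banana")
print(index.exists("nan"), index.count("ana"), index.locate("ana"))
```

In this sketch all three queries share the same interval search and differ only in what they report: whether the interval is non-empty, its size, or the text positions it contains.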

Similar resources

Clustered Distributed Index for Efficient Text Retrieval Using Threads

In this research paper, a novel method of improving the clustered distributed indices for efficient text retrieval using threads is presented. In text retrieval, text search refers to a technique of searching stored document or database. In a full text search, the search engine examines all the words in every stored document as it tries to match search words supplied by the user. When dealing w...

Clustering Full Text Documents

An index or topic hierarchy of full-text documents can organize a domain and speed information retrieval. Traditional indexes, like the Library of Congress system or Dewey Decimal system, are generated by hand, updated infrequently, and applied inconsistently. With machine learning, they can be generated automatically, updated as new documents arrive, and applied consistently. Despite the appea...

Using Google Scholar to Search for Online Availability of a Cited Article in Engineering Disciplines

Many published studies examine the effectiveness of Google Scholar (Scholar) as an index for scholarly articles. This paper analyzes the value of Scholar in finding and labeling online full text of articles using titles from the citations of engineering faculty publications. For the fields of engineering and the engineering colleges in the study, Scholar identified online access for 25% of the ...

A Two-Tier Distributed Full-Text Indexing System

The performance of indexing systems is very important for a search engine. Usually, indexing systems on large-scale clusters can provide high search efficiency, but it brings expensive hardware costs. The costs would be greatly reduced if a distributed indexing system runs on small-scale clusters connected by the Internet. Two current inverted file partitioning schemes: document partitioning an...

A Distributed Digital Library Architecture Incorporating Different Index Styles

The New Zealand Digital Library offers several collections of information over the World Wide Web. Although full-text indexing is the primary access mechanism, musical collections can also be accessed through a novel melody retrieval system. In offering this service over a three-year period, we have had to face many practical challenges in building, maintaining, and administering diverse collec...

Publication date: 2017